@littlebullGit
Contributor

@littlebullGit littlebullGit commented Oct 25, 2025

What does this PR do?

Addresses comment: #21309 (comment)

Summary


📚 Documentation preview 📚: https://pytorch-lightning--21313.org.readthedocs.build/en/21313/

@github-actions github-actions bot added the docs (Documentation related), fabric (lightning.fabric.Fabric), and pl (Generic label for PyTorch Lightning package) labels Oct 25, 2025
@littlebullGit littlebullGit force-pushed the fix/process-safe-port-manager branch from 219bc75 to c48259e Compare October 25, 2025 15:58
@codecov

codecov bot commented Oct 25, 2025

Codecov Report

❌ Patch coverage is 92.93478% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 87%. Comparing base (a883890) to head (3612cc4).
⚠️ Report is 8 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #21313    +/-   ##
========================================
- Coverage      87%      87%    -1%     
========================================
  Files         270      272     +2     
  Lines       23822    24150   +328     
========================================
+ Hits        20795    20950   +155     
- Misses       3027     3200   +173     

@SkafteNicki
Collaborator

@littlebullGit PR is looking good. Can you please check the failing check and see if it is just a random failure? (It seemed to work before merging master.)

@littlebullGit
Contributor Author

@SkafteNicki, the error seems unrelated to this PR. The test failed to start a server within 20 seconds. Maybe just try again.
E Exception: The server didn't start within 20 seconds.
../../.venv/lib/python3.9/site-packages/pytorch_lightning/serve/servable_module_validator.py:111: Exception
FAILED serve/test_servable_module_validator.py::test_servable_module_validator - Exception: The server didn't start within 20 seconds.
= 1 failed, 3139 passed, 564 skipped, 9 xfailed, 5329 warnings, 3 rerun in 1101.09s (0:18:21) =
Error: Process completed with exit code 1.

@littlebullGit littlebullGit force-pushed the fix/process-safe-port-manager branch from cc125d4 to 3612cc4 Compare October 30, 2025 02:40
@littlebullGit
Contributor Author

Relaxed the timeout check and retried the build. All checks pass now.
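For readers unfamiliar with this kind of check, here is a minimal sketch of a server-readiness wait with a configurable deadline. It is not the code in `servable_module_validator.py`; the helper name `wait_for_server`, its parameters, and the 60-second default are hypothetical and chosen only for illustration.

```python
import socket
import time


def wait_for_server(host: str, port: int, timeout: float = 60.0, interval: float = 0.1) -> bool:
    """Poll until a TCP server accepts connections or the deadline passes.

    Hypothetical helper: a larger `timeout` is one way to "relax" a
    startup check on slow CI machines.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (refused or timed out); wait and retry.
            time.sleep(interval)
    return False
```

Returning `False` instead of raising lets the caller decide whether a slow start should fail the test or trigger a retry.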

Collaborator

@deependujha deependujha left a comment

Thanks @littlebullGit for the work on this. The intent to address the port-collision issue was spot-on. Really appreciate the deep effort here.

After reviewing the failures, it turned out the core problem was isolated to standalone tests not acquiring a free port. Since they're not part of the distributed runner workflow, a simpler fix in the standalone test setup resolves the issue without the additional infrastructure.
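As an illustration of what "acquiring a free port" in a standalone test setup can look like, here is a minimal sketch. It is not taken from the proposed fix; the helper names `find_free_port` and `set_master_port_for_test` are hypothetical, and it assumes the test reads its rendezvous port from the `MASTER_PORT` environment variable, as `torch.distributed` does with env-based initialization.

```python
import os
import socket


def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the kernel choose an unused ephemeral port.
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


def set_master_port_for_test() -> int:
    """Export a fresh port so the test does not reuse a hard-coded one."""
    port = find_free_port()
    # Assumption: the test picks up its port from MASTER_PORT.
    os.environ["MASTER_PORT"] = str(port)
    return port
```

Note that the port is released as soon as the probing socket closes, so another process could still claim it before the test binds. That small race window is the trade-off of this simpler approach compared with a process-safe port manager.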

Proposed fix here: #21335

Would love your review and thoughts when you get a moment.

Thanks again for pushing on this and exploring the problem space so thoroughly. 🙌🏻⚡️
